Random variables, stochastic processes and simulation.

import numpy as np
import scipy as sp
import seaborn as sns

import matplotlib.pyplot as plt
plt.style.use('fast')

from numpy.random import normal, choice, uniform

%matplotlib inline
%config InlineBackend.figure_format='retina'

Random numbers, probability distributions and statistical analysis

  • numpy.random provides fast random number generators built on low-level code written in C.

  • scipy.stats has an extensive library of statistical distributions and tools for statistical analysis.

  • statsmodels extends SciPy functionality with more statistical tools.

  • seaborn enhances matplotlib functionality for statistical visualization.

General overview of random numbers in python

First we take a look at the most widely used random number functions in numpy, which generate the so-called standard random numbers. These are rand (uniform random numbers on the interval [0, 1)) and randn (standard normal random numbers with mean 0 and variance 1).

  • Code that uses random numbers produces different results on every run. To get reproducible results, fix the seed before generating numbers: np.random.seed(8376743)
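As a minimal sketch of seeding, the snippet here draws the same numbers twice after resetting the seed. The second half uses `np.random.default_rng`, NumPy's newer Generator interface, which is not used elsewhere in these notes and is shown only as an alternative; the variable names are illustrative.

```python
import numpy as np

# Legacy global-state interface: resetting the seed replays the same stream
np.random.seed(8376743)
a = np.random.rand(3)
np.random.seed(8376743)
b = np.random.rand(3)
print(np.array_equal(a, b))   # identical draws

# Generator interface: each generator object carries its own state
rng1 = np.random.default_rng(8376743)
rng2 = np.random.default_rng(8376743)
c = rng1.random(3)
d = rng2.random(3)
print(np.array_equal(c, d))   # identical draws
```

A separate generator object can be convenient when several parts of a program need independent, reproducible streams.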

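The uniform distribution on the interval \([a, b]\) has the probability density given here; the standard case \(U(0,1)\) corresponds to \(a=0\), \(b=1\).

\[\begin{split}f(x)=\begin{cases} {\frac {1}{b-a}}&\mathrm {for} \ a\leq x\leq b,\\[8pt]0&\mathrm {for} \ x<a\ \mathrm {or} \ x>b \end{cases} \end{split}\]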
# Generates standard uniform random numbers U(0,1)
r = np.random.rand(10) 

r
array([0.62141382, 0.19713605, 0.17467063, 0.6768546 , 0.26894241,
       0.11121689, 0.32833678, 0.69532591, 0.03252038, 0.9487097 ])
fig, ax = plt.subplots(nrows=2)

r = np.random.rand(200) 

ax[0].plot(r,  color='blue', label='trajectory')

ax[1].hist(r,  color='red',  label = 'histogram')

fig.legend();
../_images/02_RandVar_7_0.png

In the same way we generate and visualize normally distributed random numbers, \(N(0,1)\)

\[P(x |\mu=0, \sigma=1) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
fig, ax = plt.subplots(nrows=2)

# Generates standard normal random numbers N(0,1)
r = np.random.randn(200) 

ax[0].plot(r,  color='blue', label='trajectory')

ax[1].hist(r,  color='red',  label = 'histogram')

fig.legend();
../_images/02_RandVar_9_0.png
  • We showed examples of standard random numbers \(U(0,1)\) and \(N(0,1)\).

  • Parametrized random numbers let you set parameters such as the mean and the interval length, and can therefore be viewed as generalized versions of standard random numbers. Below we look at examples of continuous (uniform and normal) and discrete (binomial) random numbers generated from parametrized distributions.

np.random.uniform(low=-1, high=1, size=(3, 4))
array([[-0.44977127,  0.88320299,  0.77695608, -0.58998276],
       [ 0.22864754,  0.58243645, -0.80661315,  0.47595857],
       [ 0.32216997,  0.00646382,  0.36900723,  0.31086749]])
np.random.normal(loc=8, scale=10, size=(4, 4))
array([[ -0.13798444,  -1.56815622,  -4.28691952,   9.64229612],
       [ 13.29574973,  13.12127879,  25.25019551,  -1.68066281],
       [  5.48737843, -17.42952077,   4.53849395,  21.93517642],
       [ 10.44381736,  18.94291879,   1.58663005,   1.699577  ]])
np.random.binomial(n=10, p=0.6, size=20) # Binomial distribution; the samples are discrete, as expected.
array([7, 9, 5, 7, 6, 7, 8, 6, 4, 3, 8, 5, 9, 6, 8, 8, 6, 8, 5, 4])
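To make the link between standard and parametrized random numbers concrete, here is a small sketch that builds \(U(a,b)\) and \(N(\mu,\sigma^2)\) samples by shifting and scaling standard ones. The seed, sample size, and variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for illustration
n = 100_000

# U(a, b) from U(0, 1): stretch by (b - a), then shift by a
a, b = -1.0, 1.0
u = a + (b - a) * rng.random(n)

# N(mu, sigma^2) from N(0, 1): scale by sigma, then shift by mu
mu, sigma = 8.0, 10.0
z = mu + sigma * rng.standard_normal(n)

print(u.min(), u.max())      # stays inside [a, b)
print(z.mean(), z.std())     # close to mu and sigma
```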

Using random numbers to get answers via simulations

One of the major uses of random numbers is conducting numerical simulations. What is a simulation? It is a recreation of a process on a computer, and random numbers drive that recreation. For example, to simulate coin tosses, die throws, diffusion of molecules or conformational changes of polymers, we use random numbers to recreate the process on a computer. Let’s start off by asking some simple questions:

  • How often do we get a run of 5 or more consecutive heads in 100 coin tosses if we repeat the experiment 1000 times?

  • What if the coin is biased to generate heads only 40% of the time?
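One possible way to estimate the answers to both questions is sketched here. The helper names `longest_run` and `frac_with_run` are hypothetical, introduced only for this illustration, and the seed is arbitrary; heads are encoded as 1 and tails as 0.

```python
import numpy as np

def longest_run(flips):
    """Length of the longest run of consecutive 1s (heads) in a 0/1 sequence."""
    best = current = 0
    for f in flips:
        current = current + 1 if f == 1 else 0
        best = max(best, current)
    return best

def frac_with_run(p_heads, n_tosses=100, n_experiments=1000, run_length=5, seed=0):
    """Fraction of experiments that contain a run of at least run_length heads."""
    rng = np.random.default_rng(seed)
    flips = rng.choice([0, 1], size=(n_experiments, n_tosses),
                       p=[1 - p_heads, p_heads])
    hits = sum(longest_run(row) >= run_length for row in flips)
    return hits / n_experiments

fair = frac_with_run(0.5)     # unbiased coin
biased = frac_with_run(0.4)   # heads only 40% of the time
print(f"fair coin:   {fair:.3f}")
print(f"biased coin: {biased:.3f}")
```

The estimate fluctuates from seed to seed; increasing `n_experiments` reduces the statistical error.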

L = 100    # length of each trajectory
N = 1      # number of experiments: stochastic trajectories generated

xs = np.random.choice([0,1], (L, N)) # (i) Unbiased coin p=[0.5,0.5] by default

ys = np.random.choice([0,1], (L, N), p=[0.9, 0.1])  # (ii) biased coin: 0 with probability 0.9, 1 with probability 0.1
fig, ax = plt.subplots(nrows=2)

r = np.random.randn(200) 

ax[0].plot(xs, 'o', alpha=0.5, label  =  'data1')
ax[0].plot(ys, 'o', alpha=0.5, label =  'data2')

ax[1].hist((xs[:,0], ys[:,0]), 2, label = ("data1", "data2"))

fig.legend();
../_images/02_RandVar_17_0.png

Simple, discrete, unbiased random walk in 1D.

Let’s start off by simulating the random walk outlined in class, using two random number generators from numpy called choice and normal. Type help(np.random.choice) and help(np.random.normal) to learn more.

numpy.random.choice(A,size=(M,N,K)) 

returns elements drawn from a list of objects A, with equal probabilities (default) or with unequal probabilities if a list of probabilities is supplied via the p argument. The returned array can be a 1D, 2D or 3D array, here of shape (M, N, K).

choice(['bagel','muffin','croissant'], size=(3,5))
array([['bagel', 'bagel', 'bagel', 'croissant', 'croissant'],
       ['bagel', 'bagel', 'croissant', 'bagel', 'muffin'],
       ['muffin', 'muffin', 'bagel', 'croissant', 'bagel']], dtype='<U9')
  • numpy.random.normal(mu, sigma, size=(M,N,K)) returns random numbers distributed according to a Gaussian probability density with mean mu and standard deviation sigma, in the form of a 1D, 2D or 3D array, here of shape (M, N, K).

# Here we generate 3 sequences of normally distributed random variables of length 5
normal(0, 2, (3,5))
array([[ 1.98617362,  2.02608912, -2.09791641,  1.74859223, -5.79206836],
       [ 0.85353056, -2.47331011,  0.3880088 , -4.26787118, -1.65923946],
       [-1.94755653, -1.66662744,  1.3832905 , -1.04919874, -1.55330328]])

Simulating a 1D unbiased random walk

In the course of simulating random walks we will be generating multidimensional numpy arrays. We will adhere to the convention that:

  • Rows index the time steps (successive measurements, or samples) along a trajectory

  • Columns index the distinct trajectories (independent walkers)

  • We then take the cumulative sum over the steps of a trajectory [a,b,c,…], which accumulates the random walker’s position over time [a, a+b, a+b+c,…]. This is done with the convenient np.cumsum() function.
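A tiny hand-checkable illustration of this convention, with step values chosen by hand rather than drawn at random (the array names are illustrative):

```python
import numpy as np

# 4 time steps (rows) for 2 walkers (columns)
steps = np.array([[ 1, -1],
                  [ 1,  1],
                  [-1,  1],
                  [ 1,  1]])

# Accumulate down the time axis: each column becomes one trajectory
positions = np.cumsum(steps, axis=0)
print(positions)
# [[ 1 -1]
#  [ 2  0]
#  [ 1  1]
#  [ 2  2]]
```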

def rw_1d(L, N):
    '''
    L: trajectory length
    N: number of trajectories
    returns np.array of shape (L, N)
    '''
    
    # Draw random steps of -1 or +1
    r  = choice([-1,1], size=(L, N))
    
    # Set the first step to zero so every walker starts at position 0
    r[0,:] = 0
    
    # Accumulate position
    rw = r.cumsum(axis=0)
    
    return rw
rw = rw_1d(2000, 1000)

print(rw.shape)
L = rw.shape[0]
N = rw.shape[1]
(2000, 1000)
fig, ax = plt.subplots(nrows=2)

ax[0].plot(rw[:, 1:20])

ax[1].hist(rw[-1, :], density=True)

# Label the axes
ax[0].set_ylabel('position')
ax[0].set_title('rw trajectories');

ax[1].set_xlabel('position')
ax[1].set_ylabel('histogram')
../_images/02_RandVar_27_1.png

Statistics of random walk

from scipy.stats import norm 
fig, ax = plt.subplots(1, 5, figsize=(9, 3), sharey=True) 

for i in range(5):  
    
    t = int(i*(L/5)) + 5                   # record dist at 5 equidist points
    
    ax[i].hist(rw[t,:], density=True)      # generate histogram at different points
    
    xmin, xmax = ax[i].get_xlim()
    x = np.linspace(xmin, xmax, 100)
    
    y = norm.pdf(x, 0, np.sqrt(t))
    
    ax[i].plot(x,y)  
    
    ax[i].set_title(f"t = {t}")
../_images/02_RandVar_30_0.png
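The Gaussian curves overlaid on the histograms assume that the walker's position at time \(t\) has mean 0 and variance \(t\). A rough numerical check of that assumption is sketched here; it regenerates its own walks so that it runs on its own, and the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed for reproducibility
L, N = 2000, 1000                        # same sizes as used in these notes
steps = rng.choice([-1, 1], size=(L, N))
steps[0, :] = 0                          # every walker starts at the origin
walk = steps.cumsum(axis=0)

# The five time points used for the histograms
for t in (5, 405, 805, 1205, 1605):
    x = walk[t, :]
    print(f"t={t:5d}  mean={x.mean():7.3f}  var={x.var():9.2f}  (theory: 0, {t})")
```

With 1000 walkers the sample variance is expected to scatter by a few percent around \(t\).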

Connection with diffusion: Mean square displacement

\[ \langle X(t)^2 \rangle^{1/2} = \left(\frac{1}{N}\sum_{n=1}^{N} X_n(t)^2\right)^{1/2} \sim t^{1/2} \]
L, N = 2000, 1000
rw = rw_1d(L, N)

t = np.arange(L)
msd = np.mean(rw**2, axis=1)
rsd = msd**0.5

plt.loglog(t, rsd, lw=3) 

plt.loglog(t, np.sqrt(t), '--')

plt.title('Root mean square displacement of 1D random walker',fontsize=15)
plt.xlabel('Number of steps, n',fontsize=15)
plt.ylabel(r'$RMSD(n)=\langle x(n)^2 \rangle^{1/2}$',fontsize=15);
../_images/02_RandVar_33_0.png
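Besides comparing curves by eye, the scaling exponent can be estimated with a straight-line fit in log-log space. The sketch here is one way to do this with `np.polyfit`; it regenerates its own walks, and the seed and the names `alpha` and `c` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)            # arbitrary seed
L, N = 2000, 1000
steps = rng.choice([-1, 1], size=(L, N))
steps[0, :] = 0                           # start at the origin
walk = steps.cumsum(axis=0)

t = np.arange(1, L)                       # skip t = 0, where log(t) is undefined
rms = np.sqrt(np.mean(walk[1:, :]**2, axis=1))

# Fit log(rms) = alpha * log(t) + c; alpha estimates the scaling exponent
alpha, c = np.polyfit(np.log(t), np.log(rms), 1)
print(f"fitted exponent alpha = {alpha:.3f}")   # expected to be near 0.5
```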

Questions

  • Compute the MSD for a 1D random walk and show the expected scaling

  • Generate random walks from different positions

  • choice between [-1,1]

  • Multinomial choice

  • Self avoiding walks

def rw_2d(L, N):
    
    '''2d random walk function:
    L: trajectory length
    N: number of trajectories
    returns np.array with shape (L, N, 2)
    '''
    verteces = np.array([(1,  0),
                         (0,  1),
                         (-1, 0),
                         (0, -1)])
    
    rw       = verteces[choice([0,1,2,3], size=(L, N))]
    
    rw[0, :, :] = 0
    
    return rw.cumsum(axis=0)
traj = rw_2d(L=10000, N=1000)
print(traj.shape)
(10000, 1000, 2)

Compute root mean square distance of a random walker

\[ \langle R(t)^2 \rangle^{1/2} = \langle (X(t)^2+Y(t)^2)\rangle^{1/2} \sim t^{1/2} \]
fig, ax  = plt.subplots(nrows=2, figsize=(10,10))

#Simulate 2D random walk
L, N = 10000, 1000
traj = rw_2d(L, N)

#Compute RSD 
t            = np.arange(L)
r2           = np.sum(traj**2, axis=2)
mean_r2      = np.mean(r2, axis = 1)
rsd          = np.sqrt(mean_r2)

ax[0].plot(traj[:3000, :5, 0], traj[:3000, :5, 1]);

ax[1].loglog(t, rsd, lw=3, alpha=0.5);
ax[1].loglog(t, t**0.5, '--');

ax[0].set_title('2D random walker',fontsize=15)
ax[0].set_xlabel('X')
ax[0].set_ylabel('Y')

ax[1].set_xlabel('Number of steps, t',fontsize=15)
ax[1].set_ylabel(r'$RMSD(t)=\langle R(t)^2 \rangle^{1/2}$',fontsize=15);
../_images/02_RandVar_39_0.png
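The \(t^{1/2}\) scaling can be motivated by a short calculation, sketched here under the assumption that the steps are independent, have zero mean, and have unit length, as for the lattice steps used in rw_2d. The symbol \(\vec{s}_i\) for the \(i\)-th step is introduced only for this illustration.

\[ \langle R(t)^2 \rangle = \Big\langle \Big(\sum_{i=1}^{t} \vec{s}_i\Big)^2 \Big\rangle = \sum_{i=1}^{t} \langle \vec{s}_i \cdot \vec{s}_i \rangle + \sum_{i \neq j} \langle \vec{s}_i \cdot \vec{s}_j \rangle = t + 0 = t \]

The cross terms vanish because different steps are independent and each has zero mean, so \(\langle R(t)^2 \rangle^{1/2} = t^{1/2}\).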

Binomial distribution of a random walker

  • In the previous examples we generated random variables with no explicit reference to a probability distribution. This time we will generate random numbers from a binomial distribution.

  • We will use the scipy.stats library, which contains probability distribution functions. In addition to generating random variables, we can therefore also compute probability distributions and related quantities analytically.

from scipy.stats import binom, norm, poisson  
s =  binom(10, 0.5)       # Declare s to be a binomial RV, X ~ Binomial(n=10, p=0.5)
print(s.rvs(20))          # 20 random samples from X
print(s.pmf(5))           # P(X = 5)
print(s.cdf(5))           # P(X <= 5)
print(s.mean())           # E[X], mean
print(s.var())            # Var(X), variance
print(s.std())            # Std(X), standard deviation
[7 5 3 8 4 5 6 5 5 8 7 3 5 5 6 5 3 4 6 5]
0.24609375000000003
0.623046875
5.0
2.5
1.5811388300841898
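The analytical values printed above can be cross-checked against the closed-form binomial expressions \(P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\), \(E[X]=np\) and \(\mathrm{Var}(X)=np(1-p)\). The sketch here uses only the Python standard library; the helper name `binom_pmf` is hypothetical.

```python
from math import comb, sqrt

n, p = 10, 0.5

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), from the closed-form expression."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

pmf5 = binom_pmf(5, n, p)                           # P(X = 5)
cdf5 = sum(binom_pmf(k, n, p) for k in range(6))    # P(X <= 5)
mean = n * p                                        # E[X]
var = n * p * (1 - p)                               # Var(X)

print(pmf5, cdf5, mean, var, sqrt(var))
```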
def coin_flip(L,p,N):
    
    '''
    L: number of coin flips per experiment
    p: probability of heads on each flip
    N: number of times the experiment is repeated
    ''' 
    coin       = binom(L, p)   # Binomial RV
    
    coin_flips = coin.rvs(N)   # Generate sample of N points
    coin_pmf   = coin.pmf(np.arange(L+1)) # Generate PMF from 0 to L 
    coin_cdf   = coin.cdf(np.arange(L+1))
    
    return coin_flips, coin_pmf, coin_cdf
fig, ax=plt.subplots(nrows=1, ncols=3,figsize=(12,3))

# Flip a fair coin 20 times, repeat the experiment 80 times
X1, P1, CP1 = coin_flip(20, 0.5, 80)

ax[0].plot(X1,'-',color='grey')

ax[1].hist(X1,density=True)
ax[1].plot(P1,'-o')

ax[2].plot(CP1,'-d',color='red')
../_images/02_RandVar_45_1.png
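The binomial distribution connects back to the random walk: if \(k\) of the \(L\) steps are \(+1\) and the remaining \(L-k\) are \(-1\), the walker ends at \(x = 2k - L\). The sketch here compares end points obtained by summing steps with end points obtained from binomial samples. The seed, the sizes, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)            # arbitrary seed
L, N = 100, 20_000                        # steps per walk, number of walks

# Route 1: sum L independent +/-1 steps for each walker
x_walk = rng.choice([-1, 1], size=(L, N)).sum(axis=0)

# Route 2: draw the number of +1 steps, k ~ Binomial(L, 1/2), then x = 2k - L
k = rng.binomial(n=L, p=0.5, size=N)
x_binom = 2 * k - L

print(x_walk.mean(), x_walk.var())        # near 0 and L
print(x_binom.mean(), x_binom.var())      # near 0 and L
```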

Continuous time random walk: Brownian motion

We have learned about random variables and random walks, and have encountered the concept of a stochastic process through the example of the discrete-step 1D random walk. Now let us generate the prototypical stochastic process in continuous time: Brownian motion. Brownian motion was first discovered by the botanist Robert Brown, who noticed that pollen grains in solution undergo erratic and incessant motion. To simulate Brownian motion we take the continuous-time limit of the random walk and approximate the displacements of our particle as normally distributed (binomial->normal, time step->continuous time).

\[x(t+dt)-x(t)=N(0, 2D\,dt)\]

We assume the particle starts at position \(\mu=0\), so that after time \(t\) the variance of its position is \(\sigma^2=2Dt\), where \(D\) is the diffusion coefficient, which is related to the parameters of the discrete random walk as shown in the lecture.

\[x(t+dt)=x(t)+\sqrt{2D dt} \cdot N(0,1)\]

In the last step we rewrote the Brownian motion equation in a convenient form, using the fact that a normally distributed random variable can be obtained by scaling a standard normal variable by \(\sigma\) and shifting it by \(\mu\):

\[N(\mu, \sigma^2)=\mu+\sigma N(0,1)\]

def brown(T, N=1, dt=1, D=1):
    
    """
    Creates 3D Brownian paths given:
    T: total time
    N: number of trajectories (default 1)
    dt: time step (default 1)
    D: diffusion coefficient (default 1)
    returns np.array with shape (N, nT, 3), where nT = int(T/dt)
    """
    
    nT = int(T/dt) # how many points to sample
    
    dR = np.sqrt(2*D*dt) * np.random.randn(N, nT, 3) # 3D displacements at each time step
    
    R = np.cumsum(dR, axis=1) # accumulated 3D position of Brownian particle
    
    return R
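As a sanity check on this update rule, the variance of one coordinate across many trajectories should grow as \(2Dt\). The sketch here tests this for a single coordinate with its own small simulation; the values of `D` and `dt`, the seed, and the variable names are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)        # arbitrary seed
D, dt = 0.5, 0.01                     # illustrative values
n_steps, n_traj = 1000, 2000

# Same update rule, restricted to one coordinate: dx = sqrt(2 D dt) * N(0, 1)
dx = np.sqrt(2 * D * dt) * rng.standard_normal((n_traj, n_steps))
x = np.cumsum(dx, axis=1)

t = dt * np.arange(1, n_steps + 1)    # elapsed time after each step
var_sim = x.var(axis=0)               # variance across trajectories at each time
print(var_sim[-1], 2 * D * t[-1])     # simulated vs. theoretical 2*D*t
```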

Brownian motion

  • Below we proceed to simulate a continuous random walk in 1D to 3D

  • We will plot trajectories and distributions of the Brownian particle using interactive plotting via the holoviews/plotly interface.

import holoviews as hv
hv.extension('plotly')
R=brown(T=10000, N=1000)
print(R.shape)
(1000, 10000, 3)
hv.extension('plotly')

plots = []
for i in range(5):
    
    plot = hv.Curve(R[i,:,0], 'Time', 'X')
    plots.append(plot)

hv.Overlay(plots) + hv.HoloMap({i: hv.Distribution(R[:,i,0]) for i in range(1,1000, 20)}, kdims='Time')
hv.extension('plotly')

plots = []
for i in range(5):
    
    plot = hv.Curve(R[i,:,:2], 'X', 'Y')
    plots.append(plot)
    
hv.Overlay(plots).collate() 
hv.extension('plotly')

plots = []
for i in range(10):
    
    plot = hv.Path3D(R[i,:,:], label='3D random walk').opts(width=600, height=600, line_width=5)
    plots.append(plot)
    
hv.Overlay(plots)

Diffusion Equation

The movement of individual random walkers \(\leftrightarrow\) density of walkers \(\rho(\vec{r},t)\)

Diffusion equation:

Formulated empirically as Fick’s laws

\[\frac{\partial\rho}{\partial t} = D\nabla^2\rho\]
  • This is a 2nd order PDE! Unlike the equations of motion, the diffusion equation shows irreversible behaviour

  • This one is exactly solvable, but in general reaction-diffusion PDEs are difficult to solve analytically.

  • Can solve numerically by writing derivatives as finite differences!

  • Can also simulate via random walk!

  • Diffusion coefficient \(D\), Units \([L^2]/[T]\)
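The finite-difference route can be sketched as follows for the 1D case, using a forward step in time and a central difference in space. This is one simple explicit scheme among several, not necessarily the one used in the lecture; the grid, time step, and initial condition are illustrative choices, and the explicit scheme is stable only for \(dt \le dx^2/(2D)\).

```python
import numpy as np

D = 1.0                                  # diffusion coefficient (illustrative)
x = np.linspace(-10, 10, 401)            # spatial grid
dx = x[1] - x[0]
dt = 0.4 * dx**2 / D                     # satisfies dt <= dx^2 / (2 D)

# Initial condition: Gaussian with variance 2*D*t0
t0 = 0.1
rho = np.exp(-x**2 / (4 * D * t0)) / np.sqrt(4 * np.pi * D * t0)

n_steps = 500
for _ in range(n_steps):
    lap = np.zeros_like(rho)
    # Central difference for the second derivative; end points are held fixed
    lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / dx**2
    rho = rho + dt * D * lap             # forward Euler step in time

t_final = t0 + n_steps * dt
mass = np.sum(rho) * dx                  # total probability, should stay near 1
var_num = np.sum(x**2 * rho) * dx        # second moment of the numerical density
print(mass, var_num, 2 * D * t_final)    # variance compared with 2*D*t
```

The domain is chosen wide enough that the density is negligible at the boundaries over the simulated time.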

Important special case solution (here written in 1d):

\[\rho(x,t) = \frac{1}{\sqrt{2\pi \sigma(t)^2}}\exp\left(-\frac{x^2}{2\sigma(t)^2}\right),\]

where \(\sigma(t)=\sqrt{2{D}t}\)

  • density remains Gaussian

  • Gaussian becomes wider with time

  • check that this is indeed a solution by plugging into the diffusion equation!
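A sketch of that check in 1D: substituting \(\sigma(t)^2 = 2Dt\) gives

\[\rho(x,t) = \frac{1}{\sqrt{4\pi D t}}\exp\left(-\frac{x^2}{4Dt}\right)\]

Differentiating once in time and twice in space,

\[\frac{\partial\rho}{\partial t} = \left(-\frac{1}{2t} + \frac{x^2}{4Dt^2}\right)\rho, \qquad \frac{\partial\rho}{\partial x} = -\frac{x}{2Dt}\,\rho, \qquad \frac{\partial^2\rho}{\partial x^2} = \left(-\frac{1}{2Dt} + \frac{x^2}{4D^2t^2}\right)\rho\]

Multiplying the last expression by \(D\) reproduces the first, so \(\partial\rho/\partial t = D\,\partial^2\rho/\partial x^2\) holds for all \(x\) and \(t>0\).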

def sigma(t, D = 1):
    return np.sqrt(2*D*t)

def gaussian(x, t):
    return  1/np.sqrt(2*np.pi*sigma(t)**2) * np.exp(-x**2/(2*sigma(t)**2))
import ipywidgets as widgets

@widgets.interact(t=(0.01,1,0.001))
def diffusion(t=0.01):
    
    x = np.linspace(-4, 4, 1000)
    
    plt.plot(x, gaussian(x, 0.01), '--', color='orange')
    
    plt.plot(x, gaussian(x, t), lw=3, color='green')